1.0 INTRODUCTION

For a sensible application of Augmented Reality (AR) and Virtual Environments (VE) it is necessary to
take basic human information processing resources and characteristics into account. Because there is no fully
functional model of human perceptual, cognitive, and motor behavior, this requires empirical analyses.
Moreover, these analyses are often based on subjective ratings rather than objective measures. With regard
to perception, the basic channel through which synthetic environments are experienced, each modality should be analyzed
separately. There are specific limits of human perception which limit the transfer of information or might
even lead to unwanted negative effects or after-effects when they are not taken into consideration. Examples
are long exposure times and strong emotional involvement of the user, which may even isolate the user
from the “real” daily life. In addition to a purely short-term, technological perspective, it is therefore necessary to evaluate
the application of AR and VE in terms of its psychological and sociological impact.

Aspects of visual feedback are very important because of the dominance of the visual modality.
The usability of the display is an important factor for the user’s willingness to spend long
periods immersed in the virtual world. For example, HMDs must not be too heavy, too large or too tight-fitting.
Such factors form the first category, the General Ergonomic Factors. The second category deals with
Physiological Factors influencing vision. They include, e.g., graphics refresh rate, depth perception and
lighting level, all of which influence human performance with VE display systems. One example is that more than
25 images per second in a dark environment create the illusion of continuous motion rather than single
flickering images. However, the graphics refresh rate actually achieved depends on the scene complexity, expressed as the
number of polygons and the shading mode, and not only on the update rate of the display device itself. The third
category deals with Psychological Factors such as scene realism, scene errors (scale errors,
translation errors, etc.) and the integration of feedback and command, i.e., the modification of the
scene as a function of task-specific information. Markers or additional functionality can be added to the
virtual world to help the user perform specific tasks. An example is an “intelligent agent”
or tutor that serves as a figurative, anthropomorphic representation of the system status.
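
The dependence of the displayed frame rate on scene complexity can be illustrated with a small Python sketch. The linear cost model (a fixed per-frame overhead plus a cost per thousand polygons, scaled by a shading factor) and all of its constants are hypothetical assumptions chosen for illustration; only the threshold of 25 images per second comes from the text.

```python
# Hypothetical cost model: every constant below is an illustrative assumption,
# not a measurement from any particular graphics system.

CONTINUOUS_MOTION_HZ = 25.0  # threshold quoted in the text (dark environment)


def achievable_frame_rate(polygons: int,
                          display_refresh_hz: float,
                          overhead_ms: float = 2.0,
                          ms_per_1k_polygons: float = 0.5,
                          shading_factor: float = 1.0) -> float:
    """Estimate the frame rate the user actually sees.

    Rendering one frame costs a fixed overhead plus a per-polygon cost,
    scaled by shading_factor (e.g. > 1 for more expensive shading).
    The result can never exceed the refresh rate of the display itself.
    """
    render_ms = overhead_ms + shading_factor * ms_per_1k_polygons * polygons / 1000.0
    render_hz = 1000.0 / render_ms
    return min(render_hz, display_refresh_hz)


def appears_continuous(frame_rate_hz: float) -> bool:
    """True if the frame rate exceeds the 25 images/s threshold from the text."""
    return frame_rate_hz > CONTINUOUS_MOTION_HZ


if __name__ == "__main__":
    for n in (10_000, 50_000, 200_000):
        fps = achievable_frame_rate(n, display_refresh_hz=60.0)
        print(f"{n:>7} polygons: {fps:5.1f} fps, continuous={appears_continuous(fps)}")
```

With these assumed constants, a 10,000-polygon scene is limited by the 60 Hz display, whereas a 200,000-polygon scene drops to roughly 10 frames per second and falls below the 25 images per second threshold, even though the display itself could show 60.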

Acoustic feedback has a dual role. First, it is a medium for transmitting information. Second, it can be used
to localize the source of the information. Ergonomic factors refer to the design of the hardware and its ease
of use by humans. Physiological conditions refer to the sound frequency, which has to be within the
range of audible sound (20 to 20,000 Hz), and to the sound intensity. If the intensity is too high, it can produce
discomfort or even, above 120 dB, pain. Another factor is the signal-to-noise ratio. A more complex area is
described by psychological factors. Sound perception and processing allow the mental reconstruction of a
world that is volumetric and whose parts have specific conceptual components. A piano, for example, should
not generate a drum sound. Another example is a complex control panel, which presents a large amount of
visual feedback; here an audio alarm can draw the user’s attention to error conditions. Finally, sound or speech
recognition can also be used as another, very natural input modality of the user.
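
Sound intensity limits are usually expressed as sound pressure level (SPL) in decibels relative to the standard reference pressure of 20 µPa. The Python sketch below converts an RMS sound pressure to dB SPL, computes a signal-to-noise ratio from RMS amplitudes, and checks a planned audio cue against the audible range and the 120 dB pain threshold given above. The function names and the simple pass/fail check are illustrative assumptions, not part of any standard.

```python
import math
from typing import List

REFERENCE_PRESSURE_PA = 20e-6        # 20 µPa, standard reference for SPL in air
AUDIBLE_RANGE_HZ = (20.0, 20_000.0)  # audible range given in the text
PAIN_THRESHOLD_DB = 120.0            # level above which sound becomes painful


def spl_db(rms_pressure_pa: float) -> float:
    """Sound pressure level in dB re 20 µPa for a given RMS pressure."""
    return 20.0 * math.log10(rms_pressure_pa / REFERENCE_PRESSURE_PA)


def snr_db(signal_rms: float, noise_rms: float) -> float:
    """Signal-to-noise ratio in dB, computed from RMS amplitudes."""
    return 20.0 * math.log10(signal_rms / noise_rms)


def check_audio_cue(frequency_hz: float, rms_pressure_pa: float) -> List[str]:
    """Return a list of problems with a planned audio cue (empty if none)."""
    problems = []
    low, high = AUDIBLE_RANGE_HZ
    if not low <= frequency_hz <= high:
        problems.append("outside audible range")
    if spl_db(rms_pressure_pa) > PAIN_THRESHOLD_DB:
        problems.append("above pain threshold")
    return problems
```

For example, an RMS pressure of 0.02 Pa corresponds to 60 dB SPL, while 20 Pa already reaches the 120 dB pain threshold.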

Physical contact with the environment provides another important feedback. Some virtual tasks, especially
manual manipulation, can only be performed accurately if tactile feedback is added to the environment.
But this is often difficult. The aim for the future is to provide touch and force feedback to the whole body.
Today, haptic feedback is usually restricted to one hand only. Fortunately, many real tasks can
be carried out with one hand, so for these tasks the restriction does not degrade human performance.

Long immersion in a synthetic environment is likely to cause several severe effects. Simulator sickness,
resulting in dizziness, nausea and disorientation, is thought to be caused by a sensory conflict between
visual feedback indicating motion and kinesthetic cues that do not indicate this motion. The phenomenon is aggravated by poor
image resolution.

Factors which have been identified as contributors to simulator sickness in virtual environment systems are
shown in Table 2-1 (Frank et al., 1983; Kennedy et al., 1989; Kolasinski, 1995; Pausch et al.,
1992). They are divided into characteristics of the user, the system and the user’s task. Few systematic
studies have been carried out to determine the effects of the characteristics of virtual environment systems on
the symptoms of simulator sickness. Hence, much of the evidence for the effects of these factors comes from
studies of visually-induced motion sickness and motion-induced sickness (i.e., sickness caused by actual
vehicle motions), as well as from studies of exposure to simulators.


Table 2-1: Factors Contributing to Simulator Sickness
in Virtual Environments (Kennedy et al., 1989)

User Characteristics
  Physical Characteristics: Age; Gender; Ethnic origin; Postural stability; State of health
  Experience: With virtual reality system; With corresponding real-world task
  Perceptual Characteristics: Flicker fusion frequency; Mental rotation ability; Perceptual style

System Characteristics
  Display: Contrast; Flicker; Luminance level; Phosphor lag; Refresh rate; Resolution
  System Lags: Time lag; Update rate

Task Characteristics
  Movement through Virtual Environment: Control of movement; Speed of movement
  Visual Image: Field of view; Scene content; Vection; Viewing region; Visual flow
  Interaction with Task: Duration; Head movements; Sitting vs. standing

2.0  USER  CHARACTERISTICS 

Physical Characteristics: Age has been shown to affect susceptibility to motion-induced sickness.
Susceptibility to motion sickness is greatest between the ages of 2 and 12 years. It tends to
decrease rapidly from 12 to 21 years and then more slowly through the remainder of life
(Reason and Brand, 1975).

Females tend to be more susceptible to motion sickness than males. The differences might be due to
anatomical differences or an effect of hormones (Griffin, 1990). In a study on the occurrence of
seasickness on a ship, vomiting occurred among 14.1% of female passengers, but only 8.5% of male
passengers (Lawther and Griffin, 1986). As seasickness is another form of motion-induced sickness, gender effects
are likely to exist for simulator sickness as well.

Ethnic origin may affect susceptibility to visually-induced motion sickness. Stern et al. (1993) have
presented experimental evidence showing that Chinese women may be more susceptible than European-American
or African-American women to visually-induced motion sickness. A rotating optokinetic drum
was used to provoke motion sickness. The Chinese subjects showed significantly greater disturbances in
gastric activity and reported significantly more severe motion sickness symptoms. It is unclear whether
this effect is caused by cultural, environmental, or genetic factors.

Postural  stability  has  been  shown  to  be  affected  by  exposure  to  virtual  environments  and  simulators 
(Kennedy  et  al.,  1993,  1995).  Kolasinski  (1995)  has  presented  evidence  to  show  that  less  stable 
individuals  may  be  more  susceptible  to  simulator  sickness.  Pre-simulator  postural  stability  measurements 
were  compared  with  post-simulator  sickness  data  in  Navy  helicopter  pilots.  Postural  stability  was  found  to 
be  associated  with  symptoms  of  nausea  and  disorientation,  but  not  with  ocular  disturbances. 

The  state  of  health  of  an  individual  may  affect  susceptibility  to  simulator  sickness.  It  has  been 
recommended  that  individuals  should  not  be  exposed  to  virtual  environments  when  suffering  from  health 
problems  including  flu,  ear  infection,  hangover,  sleep  loss  or  when  taking  medications  affecting  visual  or 
vestibular function (Frank et al., 1983; Kennedy et al., 1987, 1993; McCauley and Sharkey, 1992). Regan
and Ramsey (1994) have shown that drugs such as hyoscine hydrobromide can be effective in reducing
symptoms of nausea (as well as stomach awareness and eyestrain) during immersion in a VE.

Experience: Nausea and postural problems have been shown to be reduced with increased prior experience
in simulators (Crowley, 1987) and immersive VEs (Regan, 1995). Frank et al. (1983) have suggested that
although adaptation reduces symptoms during immersion, re-adaptation to the normal environment could
lead to a greater incidence of post-immersion symptoms. Kennedy et al. (1989) have also suggested that
adaptation cannot be advocated as the technological answer to the problem of sickness in simulators, since
adaptation is a form of learning involving the acquisition of incorrect or maladaptive responses. This would
create a larger risk of negative training transfer. Experience with the corresponding real-world task, on the
other hand, may increase susceptibility: pilots with more flight experience may generally be more prone to
simulator sickness (Kennedy et al., 1987). This may be due to their greater experience of flight conditions,
leading to greater sensitivity to discrepancies between actual and simulated flight. Another reason might be
the smaller degree of control they have when acting as instructors in simulators (Pausch et al., 1992).

Perceptual  Characteristics:  Perceptual  characteristics  which  have  been  suggested  to  affect  susceptibility 
to  simulator  sickness  include  perceptual  style,  or  field  independence  (Kennedy,  1975;  Kolasinski,  1995), 
mental  rotation  ability  (Parker  and  Harm,  1992),  and  level  of  concentration  (Kolasinski,  1995). 


3.0  SYSTEM  CHARACTERISTICS 

Characteristics of the Display: Luminance, contrast and resolution should be balanced with the task to be
performed in order to achieve optimum performance (Pausch et al., 1992). Low spatial resolution can lead
to problems of temporal aliasing, similar to low frame rates (Edgar and Bex, 1995).

Flicker  of  the  display  has  been  cited  as  a  main  contributor  to  simulator  sickness  (Frank  et  al.,  1983; 
Kolasinski,  1995;  Pausch  et  al.,  1992).  It  is  also  distracting  and  contributes  to  eye  fatigue  (Pausch  et  al., 
1992). Whether flicker is perceptible depends on the refresh rate relative to the flicker fusion frequency
threshold, which in turn depends on luminance and field-of-view. As the level of luminance increases, the refresh rate must also increase to
prevent  flicker.  Increasing  the  field-of-view  also  increases  the  probability  of  perceiving  flicker  because  the 
peripheral  visual  system  is  more  sensitive  to  flicker  than  the  fovea.  There  is  a  wide  range  of  sensitivities  to 
flicker  between  individuals,  and  also  a  daily  variation  within  individuals  (Boff  and  Lincoln,  1988). 
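
The dependence of flicker visibility on luminance is often summarized by the Ferry-Porter law, which states that the critical flicker fusion (CFF) frequency rises approximately linearly with the logarithm of luminance. The Python sketch below uses this relation with placeholder slope and intercept values, which are assumptions and not measured data; the higher sensitivity of the periphery in wide fields of view is only approximated by an optional safety margin.

```python
import math

# Ferry-Porter law: CFF rises linearly with log10(luminance).
# Slope and intercept are placeholder assumptions; real values depend on
# field size, retinal location and the individual observer.
SLOPE_HZ_PER_LOG_UNIT = 10.0
INTERCEPT_HZ = 35.0


def cff_hz(luminance_cd_m2: float) -> float:
    """Estimated critical flicker fusion frequency for a given luminance."""
    return SLOPE_HZ_PER_LOG_UNIT * math.log10(luminance_cd_m2) + INTERCEPT_HZ


def flicker_likely(refresh_hz: float, luminance_cd_m2: float,
                   margin_hz: float = 0.0) -> bool:
    """True if the refresh rate does not exceed the estimated CFF plus a margin.

    A positive margin_hz can stand in for the higher flicker sensitivity of
    the periphery in wide fields of view.
    """
    return refresh_hz <= cff_hz(luminance_cd_m2) + margin_hz
```

With these assumed constants, a 60 Hz refresh rate lies above the estimated CFF at 100 cd/m² but below it at 1000 cd/m², which mirrors the statement that brighter displays need higher refresh rates.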

Other visual factors, which contribute to oculomotor symptoms reported during exposure to virtual
environments,  have  been  discussed  extensively  by  Mon-Williams  et  al.  (1993),  Regan  and  Price  (1993) 
and  Rushton  et  al.  (1994). 

System Lags and Latency: Wloka (1992) has suggested that lags of less than 300 ms are required to
maintain the illusion of immersion in a VE, because otherwise subjects start to dissociate their movements
from the associated image motions (Wloka, 1992; Held and Durlach, 1991). It is unclear whether the
authors attribute these effects to pure lags or to the system update rates. However, lags of this magnitude,
and update rates of the order of 3 frames per second, have both been shown to have large effects on
performance and on subjects’ movement strategies. The total system lag of the VE system used in the
experimental studies reported by Regan (1995) and Regan and Price (1994) was reported to be 300 ms
(Regan and Price, 1993c).

There is an urgent need for further research to systematically investigate the effect of a range of system
lags on the incidence of simulator sickness symptoms. The interaction between system lags and head
movement velocity is likely to be important, since errors in the motion of displayed images are
proportional to both total lag and head velocity.
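
Because the displayed image lags behind the head, the angular misregistration during a head movement can be approximated by multiplying the total system lag by the head angular velocity. The Python sketch below assumes a constant head velocity over the lag interval; the head velocity of 50 deg/s used in the example is an illustrative assumption, while the 300 ms lag is the value reported for the system in the Regan and Price studies.

```python
def image_error_deg(lag_ms: float, head_velocity_deg_s: float) -> float:
    """Approximate angular error of the displayed image during a head movement.

    Assumes the head turns at a constant angular velocity while the system lag
    elapses, so the image is displaced by velocity * lag from its correct position.
    """
    return head_velocity_deg_s * lag_ms / 1000.0


if __name__ == "__main__":
    for lag in (50.0, 117.0, 300.0):
        print(f"lag {lag:5.0f} ms -> error {image_error_deg(lag, 50.0):5.2f} deg")
```

At 50 deg/s, a 300 ms lag gives an image error of about 15 degrees, whereas a 50 ms lag gives 2.5 degrees; doubling either the lag or the head velocity doubles the error.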

Previous studies considering hand and head movements show that users are very sensitive to latency
changes. Subjects were able to detect latency changes with a point of subjective equality (PSE) of ~50 ms and a
just-noticeable difference (JND) of ~8-15 ms, respectively (Ellis et al., 1999a; Ellis et al., 1999b). When examining
random vs. paced head movements, PSEs of ~59 ms and JNDs of ~13.6 ms were determined (Adelstein et al., 2003).
Similar values were obtained when the visual conditions (background, foreground) or the realism of the VE were
varied (Mania et al., 2004; Ellis et al., 2004). Pausch (1992) cites data from Westra and Lintern (1985) to show that lags may
affect subjective impressions of a simulator even more strongly than they affect performance. Simulated
helicopter landings were compared with visual lags of 117 ms and 217 ms. Only a small effect on
objective performance measures occurred, but pilots believed that the lag had a larger effect than was
indicated by the performance measures.

Richard  et  al.  (1996)  suggested  that  the  frame  rate  (i.e.,  the  maximum  rate  at  which  new  virtual  scenes  are 
presented  to  the  user)  is  an  important  source  of  perceptual  distortions.  Low  frame  rates  make  objects 
appear  to  move  in  saccades  (discrete  spatial  jumps).  Thus,  the  visual  system  has  to  bridge  the  gaps 
between perceived positions by using spatio-temporal filtering. Such sampled motion may also produce
other artifacts such as motion reversals (Edgar and Bex, 1995). Low frame rates (particularly
when  combined  with  high  image  velocities)  may  cause  the  coherence  of  the  image  motion  to  be  lost,  and  a 
number  of  perceptual  phenomena  may  occur,  including  appearance  of  reversals  in  the  perceived  motion 
direction,  motion  appearing  jerky,  and  multiple  images  trailing  behind  the  target.  This  phenomenon  is 
referred  to  as  temporal  aliasing.  Edgar  and  Bex  (1995)  discuss  methods  for  optimizing  displays  with  low 
update  rates  to  minimize  this  problem. 
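
Temporal aliasing can be illustrated with a periodic stimulus, such as a grating of stripes, that moves at constant velocity and is sampled at the display frame rate. The Python sketch below assumes that the visual system matches each stripe to its nearest neighbor in the next frame, so only the displacement modulo the stripe period is perceived; this nearest-match assumption is a simplification, not a model taken from Edgar and Bex (1995).

```python
def apparent_step(velocity: float, frame_rate_hz: float, period: float) -> float:
    """Apparent per-frame displacement of a moving periodic pattern.

    The true displacement per frame is velocity / frame_rate_hz. Because the
    pattern repeats every `period` units, and each element is assumed to be
    matched to its nearest neighbor in the next frame, the displacement is
    only seen modulo the period, wrapped into [-period/2, period/2).
    A negative result means the pattern appears to move backwards.
    """
    true_step = velocity / frame_rate_hz
    return (true_step + period / 2.0) % period - period / 2.0


if __name__ == "__main__":
    # Grating with a 1-degree period moving at 10 deg/s.
    for fps in (60.0, 15.0, 10.0):
        print(f"{fps:4.0f} Hz: apparent step {apparent_step(10.0, fps, 1.0):+.3f} deg/frame")
```

For a stripe period of 1 degree moving at 10 deg/s, a 60 Hz display gives a forward step of 1/6 degree per frame; at 15 Hz the true step of 2/3 degree exceeds half the period and the grating appears to move backwards by 1/3 degree per frame, and at 10 Hz it appears to stand still. Under this assumption, reversal sets in once the per-frame displacement exceeds half the pattern period.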


4.0  TASK  CHARACTERISTICS 

Movement through the Virtual Environment: The degree of control over the motion affects general motion-induced
sickness and simulator sickness. The incidence of simulator sickness among aircrew has been
reported to be lower in pilots (who are most likely to generate control inputs) than in co-pilots or other
crew members (Pausch et al., 1992).

The  speed  of  movement  through  a  virtual  environment  determines  global  visual  flow,  i.e.,  the  rate  at 
which  objects  flow  through  the  visual  scene.  The  rate  of  visual  flow  influences  vection  and  is  related  to 
simulator  sickness  (McCauley  and  Sharkey,  1992).  Other  motion  conditions  that  have  been  observed  to 
exacerbate sickness in simulators include tasks involving high rates of linear or rotational acceleration,
unusual maneuvers such as flying backwards, and freezing or resetting the simulation during exposures
(McCauley and Sharkey, 1992).
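
The global visual flow produced by self-motion over a ground plane is commonly expressed as the ratio of speed to eye height, in eye-heights per second. The Python sketch below computes this global optical flow rate and the angular velocity of a single ground texture element; it assumes level, straight motion over a flat ground plane and a ground element lying straight ahead, which are simplifications for illustration.

```python
import math


def global_flow_rate(speed_m_s: float, eye_height_m: float) -> float:
    """Global optical flow rate in eye-heights per second (speed / eye height)."""
    if eye_height_m <= 0:
        raise ValueError("eye height must be positive")
    return speed_m_s / eye_height_m


def ground_point_angular_velocity_deg_s(speed_m_s: float,
                                        eye_height_m: float,
                                        depression_deg: float) -> float:
    """Angular velocity of a ground element straight ahead of the observer.

    For level motion at speed v and eye height h, a ground point seen at a
    depression angle g below the horizon moves at (v / h) * sin(g)**2 rad/s.
    """
    g = math.radians(depression_deg)
    return math.degrees(global_flow_rate(speed_m_s, eye_height_m) * math.sin(g) ** 2)
```

For example, 100 m/s at an eye height of 20 m gives 5 eye-heights per second, ten times the flow rate of the same speed at 200 m. The same virtual speed can therefore produce very different visual flow, depending on the height of the viewpoint.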

Regan and Price (1993c) have suggested that the method of movement through the virtual world affects
the level of side-effects. Experiments to investigate side-effects in immersive VEs have used a
3D mouse to generate movement (Regan, 1995; Regan and Price, 1993c, 1994; Cobb et al., 1995). This is
likely to generate a conflict between the visual, vestibular and somatosensory senses of body movement.
A  more  natural  movement  might  be  provided  by  coupling  movement  through  a  virtual  environment  to 
walking  on  a  treadmill  (Regan  and  Price,  1993c). 

Visual Image: A wider field-of-view may enhance performance in a simulator, but also increases the risk of
simulator sickness (Kennedy et al., 1989; Pausch et al., 1992), although the effect of field of
view is often confounded with other factors (Kennedy et al., 1989). Stern et al. (1990) have shown that
restricting the width of the visual field to 15 degrees significantly reduces both circular vection and the
symptoms of motion sickness induced by a rotating surround with vertical stripes (optokinetic drum).
Fixation on a central point in the visual field also reduces the circular vection induced by rotating stripes
observed with peripheral vision, and greatly reduces motion sickness symptoms (Stern et al., 1990).
Circular vection increases with increasing stimulus velocity up to about 90 degrees per second (Boff and
Lincoln, 1988). Further increases in stimulus velocity may inhibit the illusion. Vection does not depend on
acuity or luminance (down to scotopic levels) (Leibowitz et al., 1979).

Linear vection can be induced visually by an expanding pattern of texture points. Andersen and Braunstein
(1985) showed that linear vection could be induced by a moving display of radially expanding dots with a
visual angle as small as 7.5° in the central visual field. They suggested that the type of motion and the
texture in the display may be as important as the field-of-view in inducing vection. The incidence of
simulator sickness has been shown to be related to the rate of global visual flow, i.e., the rate at which
objects flow through the visual scene (McCauley and Sharkey, 1992). The direction of self-motion can be
derived from the motion pattern of texture points in the visual field (Warren, 1976; Zacharias et al., 1985).
During straight motion, the optical flow field appears to expand from a focal point, which indicates the direction of motion.
For curved motion the expanding flow field tends to bend sideways, and the focal point is no longer
defined. Grunwald et al. (1991) have shown how unwanted image shifts, which are due to lags in a flight
simulator with a head-coupled head-mounted display, distort the visual flow field. In straight and level
flight, the unwanted image motions which occur during head movements cause the expanding visual
pattern to appear to bend, creating the illusion of a curved flight path. The bending effect is proportional to
the ratio of the magnitude of the image shifts to the apparent velocity along the line of sight.
The apparent velocity depends on the velocity-to-height ratio. Hence, the angular errors induced by the
bending effect increase with decreasing velocity and increasing altitude.
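
The focal point from which the optical flow expands (the focus of expansion) can be estimated from a set of flow vectors. The Python sketch below assumes purely translational self-motion, in which every flow vector points away from the focus of expansion, and finds the point that minimizes the squared perpendicular distances to the lines through each flow vector. This least-squares formulation is one common textbook approach and is not necessarily the method used by Warren (1976) or Zacharias et al. (1985).

```python
from typing import List, Tuple

Vec = Tuple[float, float]


def estimate_foe(points: List[Vec], flows: List[Vec]) -> Vec:
    """Least-squares focus of expansion for a purely translational flow field.

    Each flow vector (u, v) at image point (x, y) should point away from the
    FOE, so the FOE lies on the line through (x, y) along (u, v). With the
    line normal n = (-v, u), this gives one linear equation n . foe = n . p.
    Solving the 2x2 normal equations minimizes the squared line distances.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (x, y), (u, v) in zip(points, flows):
        nx, ny = -v, u
        c = nx * x + ny * y
        a11 += nx * nx
        a12 += nx * ny
        a22 += ny * ny
        b1 += nx * c
        b2 += ny * c
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("flow vectors are parallel; the FOE is undefined")
    # Cramer's rule for [[a11, a12], [a12, a22]] @ foe = [b1, b2].
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


if __name__ == "__main__":
    true_foe = (0.2, -0.1)
    pts = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0), (0.5, 0.5)]
    # Synthetic radial flow expanding from true_foe.
    flows = [(0.5 * (x - true_foe[0]), 0.5 * (y - true_foe[1])) for x, y in pts]
    print(estimate_foe(pts, flows))  # approximately (0.2, -0.1)
```

With a rotational component, as in curved motion or in lag-induced image shifts during head movements, the flow lines no longer meet in a single point and the residual distances grow, which corresponds to the focal point no longer being defined.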

Linear vection has been observed to influence postural adjustments made by subjects in the fore-aft
direction. Lestienne et al. (1977) observed inclinations of subjects in the same direction as the
movement of the visual scene, with a latency of 1 to 2.5 s, and an after-effect on the cessation
of motion. The amplitude of the postural adjustments was proportional to the image velocity.

Interaction with the Task: Even short exposures of less than 10 minutes to immersive virtual
environments have been shown to result in significant incidences of nausea, disorientation and
ocular problems (Regan and Price, 1993c). Longer exposures to virtual environments can result in an
increased incidence of sickness and require longer adaptation periods (McCauley and Sharkey, 1992).
The severity of motion-induced sickness symptoms has been shown to increase with the duration of
exposure to the provocation for durations of up to at least 6 hours (Lawther and Griffin, 1986). Kennedy et al.
(1993) reported that longer exposures to simulated flight increased the intensity and duration of postural
disruption.

The extent of image position errors, and of conflicts between visual and vestibular motion cues, will depend
on the interaction between head motions and the motions of visual images on the display. Head
movements in simulators have been reported to be very provocative (Lackner, 1990, reported by Pausch
et al., 1992). However, Regan and Price (1993c) found that over a ten-minute period of immersion in a
virtual environment, there was no significant effect of type of head movement on reported levels of
simulator sickness. Sickness incidence was compared between two ten-minute exposures to an immersive
virtual environment. One exposure involved pronounced head movements and rapid interaction with the
system. During the other exposure, subjects were able to control their head movements and their speed of
interaction to suit themselves. There was some evidence that the pronounced head movements initially caused
higher levels of symptoms, but that subjects adapted to the conditions by the end of the exposures.
No measurements were made of head movements, so the effect of the instructions given to the subjects on
the velocity and duration of head movements is unclear. The system lag was reported to be 300 ms,
so even slow head movements might have been expected to result in significant spatio-temporal distortions.
The authors suggest an urgent need for further research to systematically investigate the interaction
between system lags and head movement velocity and its effect on the incidence of side-effects.

The  levels  of  symptoms  reported  by  seated  subjects  after  immersion  in  a  virtual  environment  have  been 
reported  to  be  slightly  higher  than  the  level  of  symptoms  reported  by  standing  subjects  (Regan  and  Price, 
1993c). However, the differences were not statistically significant after ten-minute exposures.

The European Telecommunications Standards Institute (ETSI) has published several reports on Human Factors
in many areas of computer science. ETSI (2002) presents guidelines for the design and use of multimodal
symbols. It provides a study of the needs and requirements for the use of multimodal symbols
in user interfaces, which can also be adapted to VEs.


5.0  PERCEPTUAL  REQUIREMENTS 

5.1  Visual  Requirements 

Most environmental information is gained through the visual modality. The physiology of the eye determines
limitations and requirements for displaying information on a computer display. With current technology,
information can be presented faster than the human can perceive and process it.
Therefore, the speed of human-computer interaction is mainly limited by the human operator and not by the
computer.

Basic visual perception starts with a projection of the image of the environment onto the retina. Special
photoreceptors transform the visual stimuli into electrical signals. There are two different types of
photoreceptors on the retina, which are commonly referred to as “rods” and “cones”. Rods are very sensitive to
light, but saturate at high levels of illumination, whereas cones are less sensitive, but can operate at higher
luminance levels (Monk, 1984). Cones occur predominantly in the fovea, the focal point of the eye image,
and rods are more predominant towards the periphery. This results in a relatively small angle of
view for clear and sharp images, with a size of only 1 or 2 degrees. With growing angles, sharpness
decreases rapidly. Consequently, information should be displayed within this small angle. Otherwise the
eye has to move continuously in order to catch a complete glimpse. For a complete overview, additional
cognitive resources are required to assimilate the single views into a complete mental picture.
In combination with the limited capacity of short-term memory, this means that only a small amount of information
can be displayed effectively on a single screen.
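
The size of the region that can be seen sharply follows from simple geometry: an object of extent s at viewing distance d subtends a visual angle of 2·arctan(s / 2d). The Python sketch below converts between the two; the 600 mm viewing distance in the example is an assumed, typical desktop distance, not a value from the text.

```python
import math


def size_for_visual_angle(angle_deg: float, viewing_distance_mm: float) -> float:
    """Extent on the screen (mm) that subtends angle_deg at the given distance."""
    return 2.0 * viewing_distance_mm * math.tan(math.radians(angle_deg) / 2.0)


def visual_angle_deg(size_mm: float, viewing_distance_mm: float) -> float:
    """Visual angle (degrees) subtended by an object of size_mm."""
    return math.degrees(2.0 * math.atan(size_mm / (2.0 * viewing_distance_mm)))


if __name__ == "__main__":
    # Assumed desktop viewing distance of 600 mm.
    print(f"2 deg at 600 mm: {size_for_visual_angle(2.0, 600.0):.1f} mm")
```

At 600 mm, the 2-degree region of sharp vision corresponds to only about 21 mm on the screen.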

The eye’s ability to distinguish color, luminance, contrast and brightness is another factor that has to be
considered. The color of an object is determined by the wavelengths of the light that is reflected from it.
The visible spectrum reaches from violet at about 400 nm to red at about 700 nm. Different colors are obtained through
combinations of wavelengths throughout this range. Color sensitivity is created by the
existence of three different types of cones in the eye, often labeled blue, green, and red. Each type of cone responds to a
certain broad, overlapping range of wavelengths. By combining wavelengths, the human eye can distinguish more
than 8,000 different colors (Monk, 1984). Approximately 8% of the male population and less than 1% of
the female population suffer from color blindness to some degree. Color blindness is the inability to
distinguish certain colors, notably reds and greens. This is important to remember when
designing visual displays for a larger user group.

Luminance is a measure of the amount of light reflected from (or emitted by) a surface. For a reflecting surface it is determined
by the amount of light that shines on the object and the reflectance of its surface. Its unit of measure is candela
per square meter (cd/m²). Research has determined that there is a range of optimal luminance levels and
that low illumination can be a hindrance to an otherwise good HCI.

Contrast is defined as the difference between the luminance of an object and its background, divided by the
luminance of the background (Downton, 1991). It is a measure of how easily the eye can distinguish
foreground from background. A bright background with black writing has a low luminance for the
writing and a high luminance for the background. Such a screen therefore has a negative contrast.
The higher the absolute value of the contrast, the easier it is to distinguish objects.
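
The contrast definition above can be written as C = (L_object - L_background) / L_background. The Python sketch below implements it directly; the luminance values in the example are illustrative assumptions.

```python
def contrast(object_luminance: float, background_luminance: float) -> float:
    """Contrast as defined in the text: (L_object - L_background) / L_background.

    Luminances are in cd/m². Negative values mean the object is darker than
    its background (e.g. black writing on a bright screen).
    """
    if background_luminance <= 0:
        raise ValueError("background luminance must be positive")
    return (object_luminance - background_luminance) / background_luminance


if __name__ == "__main__":
    print(contrast(5.0, 100.0))   # dark writing on a bright background
    print(contrast(100.0, 5.0))   # bright writing on a dark background
```

Black writing at 5 cd/m² on a 100 cd/m² background gives C = -0.95 (negative contrast), while bright writing at 100 cd/m² on a 5 cd/m² background gives C = 19. Because this definition normalizes by the background, negative values cannot fall below -1, whereas positive values are unbounded.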

Brightness  is  usually  thought  of  as  a  subjective  property  of  light.  It  depends  on  many  factors.  The  main 
one  is  comparative  illumination.  A  cloudy  day  may  seem  quite  dull.  The  same  day  would  be  quite  bright  if 
you  were  just  emerging  from  a  dark  room.  Brightness  contrast  can  cause  several  common  optical  illusions 
as  well. 

5.2  Special  Visual  Issues 

There  are  several  other  issues  which  have  to  be  considered  when  designing  visual  output.  They  are  based 
on  characteristics  and  deficits  of  human  visual  perception. 

5.2.1  Eye  Dominance 

The majority of people have a distinct preference for one eye over the other. This preference can
typically be identified quickly and easily with sighting tests (Peli, 1990). Eye dominance has shown only a limited
performance advantage in military targeting tasks (Verona, 1980). Yet, the dominant eye will be less
susceptible to suppression in binocular rivalry, and this likelihood of suppression will further decrease over
time.

An estimated 60% of the population is right-eye dominant. Since only 10% of the population is left-handed,
about 90% would be right-eye dominant if eye dominance followed handedness. Eye dominance therefore
does not correspond to users being left- or right-handed.

5.2.2  Pupil  Adaptation 

To control the amount of light entering the eye, the pupil will constrict (reducing the amount of light) or
dilate (letting more light in). When the illumination is suddenly increased, the pupil will overcompensate by
constricting and then dilating slowly to match the light level. After the illumination is reduced, the pupil
cycles through several dilations and constrictions. Complete constriction may take less than one minute,
but complete dilation may take over 20 minutes (Alpern and Campbell, 1963). This is partially caused by
the fact that the cones (responsible for color perception) recover more quickly than the rods (which are
responsible for night vision), but have lower sensitivity. The size of the pupil will decrease once a target
gets closer than 1 meter (Alpern and Campbell, 1963). This has been attributed to the increased
luminance caused by the light reflected off the target, although pupil constriction is also part of the near
response that accompanies accommodation and convergence.

5.2.3 Visual Field

The visual field (the area the eye can perceive) extends roughly 60 degrees above and below the center and, for each eye, slightly over 90 degrees to the outside (temporal side) and 60 degrees to the inside (nasal side), where it is partially blocked by the nose. The lateral visual field slowly declines with age: at the age of 20 it spans nearly 180 degrees horizontally, while at the age of 80 it is reduced to 135 degrees. Women have slightly larger visual fields than men, primarily due to differences on the nasal side (Burg, 1968).
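The per-eye extents can be combined into the overall and the binocular horizontal field by simple arithmetic, which is useful when comparing them with the field of view of a display. The sketch below is illustrative only: it assumes both eyes have identical, mirror-symmetric fields centred on straight-ahead gaze and uses the approximate values quoted above; the function name is hypothetical.

```python
def horizontal_fields(temporal_deg: float, nasal_deg: float) -> dict:
    """Approximate horizontal visual fields from per-eye extents (degrees).

    Assumes both eyes have identical, mirror-symmetric fields centred on
    straight-ahead gaze. The left eye then covers -temporal..+nasal and the
    right eye covers -nasal..+temporal.
    """
    monocular = temporal_deg + nasal_deg            # field of a single eye
    total = 2 * max(temporal_deg, nasal_deg)        # union of both eyes' fields
    overlap = 2 * min(temporal_deg, nasal_deg)      # region seen by both eyes
    return {"monocular": monocular, "total": total, "binocular_overlap": overlap}

print(horizontal_fields(90, 60))
# {'monocular': 150, 'total': 180, 'binocular_overlap': 120}
```

With 90 and 60 degrees the sketch gives a total horizontal field of 180 degrees, of which about 120 degrees is seen by both eyes, consistent with the figure of nearly 180 degrees for a 20-year-old.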

5.2.4  Accommodation 

Accommodation is the focusing of the lens of the eye through muscle movement. As humans get older, their ability (speed and accuracy) to accommodate decreases (Soderberg et al., 1993). For instance, accommodating from infinity to 10 inches takes a 28-year-old 0.8 seconds, while a 41-year-old takes about 2 seconds (Kruger, 1980). The ability to accommodate rapidly appears to decline from about the age of 30, and those over 50 suffer the most. Younger people (under the age of 20) accommodate faster regardless of target size. However, the ability to accommodate may begin to decline as early as age 10. For all age groups, accommodation in binocular viewing is both faster and more accurate than in monocular viewing (Fukuda et al., 1990). The Resting Point of Accommodation (RPA) describes the accommodation state the eye assumes when at rest. It migrates inward over time. In addition, the response time to reach both the RPA and far-point focus increases over time (Roscoe, 1985). Given these changes, a VVS (Virtual View System) with adjustable focus is likely to improve product usability.
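Accommodation is commonly expressed in diopters (D), the reciprocal of the viewing distance in metres, so refocusing from infinity to 10 inches (about 0.254 m) is a change of about 3.9 D. The sketch below converts Kruger's timings into average accommodation rates; it assumes a simple average over the whole response, ignores latency, and its helper names are hypothetical.

```python
INCH_TO_M = 0.0254  # exact inch-to-metre conversion factor

def accommodative_demand(distance_m: float) -> float:
    """Accommodative demand in diopters for a target at distance_m metres.
    A target at optical infinity requires 0 D."""
    if distance_m == float("inf"):
        return 0.0
    return 1.0 / distance_m

def mean_rate(distance_m: float, seconds: float) -> float:
    """Average accommodation rate (D/s) when refocusing from infinity to distance_m."""
    return accommodative_demand(distance_m) / seconds

near = 10 * INCH_TO_M  # 10 inches is about 0.254 m
for age, t in [(28, 0.8), (41, 2.0)]:
    print(f"age {age}: {accommodative_demand(near):.2f} D in {t} s "
          f"-> {mean_rate(near, t):.2f} D/s")
```

Under these assumptions the 28-year-old accommodates at about 4.9 D/s and the 41-year-old at about 2.0 D/s, i.e. at less than half the rate.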

5.2.5  Sensitivity  to  Flicker 

Sensitivity to flicker is highest when the eyes are light-adapted. Users may therefore notice flicker in the display until their eyes have dark-adapted. The periphery of the retina is also more sensitive to flicker and motion, and the closer an object is to the eye, the more likely it is that flicker will be detected (Kelly, 1969).

5.2.6  Vision  Deficiencies 

A wide variety of deficiencies of the visual system may occur in members of the general population. If untreated, they may lead to discomfort when using visual displays. The most common of these problems are briefly discussed in the following.

In his review of the “Private Eye” viewing device, Peli (1990) reported that a large portion of the discomfort associated with the display was due to pre-existing visual conditions. This was confirmed by Rosner and Belkin (1989), who recommend that a complete eye exam and correction of existing visual problems be undertaken prior to using a display system. These problems become more prevalent with older users, because visual acuity and performance decline with age. People in their 20s have 20/20 vision on average; younger subjects may have 20/15 vision. With progressing age, visual acuity decreases to 20/30 by age 75 (Owsley et al., 1983).
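A Snellen fraction such as 20/30 means that a subject resolves at 20 feet what a reference observer resolves at 30 feet. For comparisons, such fractions are often converted to decimal acuity or to logMAR (the base-10 logarithm of the minimum angle of resolution in minutes of arc), where larger logMAR values mean poorer vision. The conversion below is a sketch using these standard definitions; it is not taken from the cited studies, and the function names are hypothetical.

```python
import math

def snellen_to_decimal(test_distance: float, letter_distance: float) -> float:
    """Decimal acuity for the Snellen fraction test_distance/letter_distance."""
    return test_distance / letter_distance

def snellen_to_logmar(test_distance: float, letter_distance: float) -> float:
    """logMAR = log10(MAR), where MAR (arc minutes) = letter_distance / test_distance."""
    return math.log10(letter_distance / test_distance)

# The values quoted in the text: young subjects, young adults, age 75.
for denom in (15, 20, 30):
    print(f"20/{denom}: decimal {snellen_to_decimal(20, denom):.2f}, "
          f"logMAR {snellen_to_logmar(20, denom):+.2f}")
```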

It is estimated that 3% to 4% of the general population suffer from strabismus, the inability to align both eyes on the same single point. This condition usually develops before the age of eight and is hereditary in many cases. Patients with early, untreated strabismus are also likely to develop amblyopia (lazy eye phenomenon), a condition in which one eye drifts while the other remains focused on an object. Both conditions lead to impaired depth perception. Approximately 2% of the general population is estimated to suffer from amblyopia (Peli, 1990).

Phoria is the tendency of a covered eye to deviate from the fixation point of the open eye. While these deviations can be very large after only several hours of occlusion, normal vision returns after only 1 minute (Peli, 1990). Phoria can cause the temporary elimination or reduction of stereoscopic depth perception even after both eyes are uncovered. Additional research on adults has shown that even after eight days of one-eye occlusion, subjects were able to regain normal vision hours after both eyes were uncovered. Measurable, though slight, phoria was found after using the “Private Eye” monocular viewing device (Peli, 1990). Changes in phoria are most likely to occur in individuals who already suffer from uncorrected visual problems (Saladin, 1988). Half of the patients with near- or far-sightedness also suffer from hyperphoria, a tendency of the eyes to drift upward, which also affects depth perception.

For normal binocular vision to develop, each eye must function well throughout the early developmental years of childhood. This period of development is most sensitive to disruption up to the age of five and remains critical until the age of nine, when the visual system matures (Peli, 1990). While constant use of a visual display by a person under the age of six could lead to visual problems, it is doubtful that most common VR displays can be worn comfortably by such young users, or that they could use such a display long enough. In addition, common AR displays are often designed as see-through devices, so it is doubtful that young users would attend to the monocular stimulus long enough to cause permanent damage.

5.3  Audio  Requirements 

Although vision is unquestionably the primary modality for transferring information from a computer, practically every personal computer has a sound card today. Audio is becoming a common way of presenting additional information, and many software help packages have an audio as well as a visual component. A basic understanding of human hearing, its capabilities and limitations, also helps the designer in setting up audio VR components.

Hearing basically involves the same problems as seeing: perceiving environmental stimuli, translating them into nerve impulses, and attaching meaning to them (Sutcliffe, 1989). At a physical level, audio perception is based on sound waves, which travel as longitudinal waves through air or other media. Sound is characterized by frequency and amplitude: frequency determines the pitch of the sound and amplitude determines its volume. Frequency is measured in cycles per second, or hertz (Hz), with 1 cycle per second equaling 1 Hz. Young children can hear in the range of about 20 Hz to over 15,000 Hz, and this range decreases with age. Audible speech lies between 260 and 5600 Hz, but even with a range limited to 300 to 3000 Hz (as in telephone transmission) communication is still possible (Sutcliffe, 1989). Speech, like most everyday sounds, is a very complex mixture of frequencies.
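The frequency ranges above can be compared directly, for instance to see how much of the speech band a telephone channel passes. The sketch below simply encodes the approximate ranges from the text; the band names and helper functions are hypothetical, and the comparison is made on a linear hertz scale, which does not reflect perceived pitch or intelligibility.

```python
# Approximate frequency ranges quoted in the text (Hz).
BANDS = {
    "young child hearing": (20, 15_000),
    "audible speech": (260, 5_600),
    "telephone channel": (300, 3_000),
}

def bands_containing(freq_hz: float) -> list[str]:
    """Names of the bands whose range includes freq_hz (inclusive)."""
    return [name for name, (lo, hi) in BANDS.items() if lo <= freq_hz <= hi]

def retained_fraction(band: str, channel: str) -> float:
    """Fraction of `band`'s width (in Hz) that also lies inside `channel`."""
    lo, hi = BANDS[band]
    c_lo, c_hi = BANDS[channel]
    overlap = max(0, min(hi, c_hi) - max(lo, c_lo))
    return overlap / (hi - lo)

print(bands_containing(4_000))
print(f"{retained_fraction('audible speech', 'telephone channel'):.2f}")
```

Measured in hertz, the telephone channel retains roughly half of the 260-5600 Hz speech band, yet, as noted above, communication remains possible.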

The volume or intensity of a sound is expressed in decibels (dB), a logarithmic expression of the ratio between two sound amplitudes. Taken as the ratio of the primary sound to the background sound, it gives a measure of the ability to hear what is intended. A whisper is about 20 dB, and normal speech registers between 50 and 70 dB. Hearing loss can result from sounds exceeding 140 dB (Downton, 1991). Below 20 dB, sounds can be heard but are not distinguishable; the ear cannot determine frequency changes below this level.
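Because the decibel scale is logarithmic, equal differences in dB correspond to equal ratios. A minimal sketch, assuming the usual conventions of 20 log10 for amplitude (sound pressure) ratios and 10 log10 for intensity ratios, and taking 60 dB as a representative value for normal speech:

```python
import math

def amplitude_ratio_to_db(ratio: float) -> float:
    """Level difference in dB for a ratio of sound-pressure amplitudes."""
    return 20 * math.log10(ratio)

def db_to_amplitude_ratio(db: float) -> float:
    """Amplitude ratio corresponding to a level difference in dB."""
    return 10 ** (db / 20)

def db_to_intensity_ratio(db: float) -> float:
    """Intensity grows with the square of amplitude, hence the divisor 10."""
    return 10 ** (db / 10)

whisper, speech = 20, 60  # dB; 60 dB is an assumed value within the 50-70 dB range
diff = speech - whisper
print(diff, db_to_amplitude_ratio(diff), db_to_intensity_ratio(diff))
```

The 40 dB difference between a whisper and normal speech thus corresponds to a 100-fold ratio in sound-pressure amplitude and a 10,000-fold ratio in intensity.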

More important for acoustic perception than the physical characteristics of sound is the human ability to interpret sound. The auditory centre of the cortex appears to be able to distinguish three different types of sound: unimportant background sounds (noise), background sounds that have significance (a child’s cry, a dog’s bark, etc.) and speech (Sutcliffe, 1989). Language is full of mispronounced words, unfinished sentences, missing words, interruptions, etc., but the brain still has to be able to interpret it. This seems to be done by comparison with past experience, analyzing the input as a stream. The same sounds can therefore be “heard” differently depending on the context. Speech is continuous: when analyzed, it does not appear as disjointed syllables or phonemes, but as a continuous stream that must be interpreted at a rate of between 160 and 220 words per minute (Sutcliffe, 1989).

5.3.1  Sound  Perception 

There are several auditory localization cues that help locate the position of a sound source in space. The first is the interaural time difference, the time delay between a sound arriving at the left and at the right ear. The second is head shadow, the effect on a sound (chiefly its attenuation) of travelling through or around the head before reaching the far ear. The third is the pinna response, the effect that the external ear, or pinna, has on sound. The fourth is the shoulder echo, the reflection of sound in the range of 1-3 kHz by the upper torso.
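The interaural time difference can be estimated from head geometry. The text does not give a formula; one widely used approximation, swapped in here, is Woodworth's spherical-head model, ITD = (r/c)(θ + sin θ), for a distant source at azimuth θ. The head radius of 8.75 cm and the speed of sound of 343 m/s are assumed typical values.

```python
import math

HEAD_RADIUS_M = 0.0875  # assumed average head radius
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C

def itd_woodworth(azimuth_deg: float,
                  radius: float = HEAD_RADIUS_M,
                  c: float = SPEED_OF_SOUND) -> float:
    """Interaural time difference in seconds for a distant source.

    Woodworth's spherical-head approximation, valid for azimuths from
    0 (straight ahead) to 90 degrees (directly to one side).
    """
    theta = math.radians(azimuth_deg)
    return (radius / c) * (theta + math.sin(theta))

for az in (0, 30, 60, 90):
    print(f"{az:>2} deg: {itd_woodworth(az) * 1e6:.0f} us")
```

For a source directly to one side the model yields roughly 0.65 ms, the largest delay it predicts.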

The fifth localization cue is provided by movements of the head, which help to determine the location of a sound source. Another is the early echo response in the first 50-100 ms of a sound’s life; further reverberation is caused by reflections from surrounding surfaces. The final cue is the visual modality, which helps us to quickly locate and confirm the location and direction of a sound.

5.3.2  Sound  Processing 

The immersive quality of VR can be enhanced through the use of properly cued, realistic sounds. When designing a VR system, synthetic sounds have to be generated that resemble those in the real world. Sound processing includes the encoding of directional localization cues on several audio channels, the transmission or storage of sound in a certain format, and the playback of sound.

5.3.2.1 Different Types of Sounds

Mono  sound: 

•  Recorded  with  one  microphone;  signals  are  the  same  for  both  ears. 

•  Sound  only  at  a  single  point  (“0”-dimensional),  no  perception  of  sound  position. 

Stereo  sound: 

• Recorded with two microphones several feet apart and separated by empty space; the signal from each microphone is delivered to one ear.

•  Perceived  commonly  by  means  of  stereo  headphones  or  speakers;  typical  multimedia  configuration 
of  personal  computers. 

• Gives a better sense of the sound’s position as recorded by the microphones, but only varies along one axis (1-dimensional), and the sound sources appear to be located inside the listener’s head.

Binaural  Sound: 

• Recorded in a manner that more closely resembles the human acoustic system: by microphones embedded in a dummy head.

•  Sounds  more  realistic  (2-dimensional),  and  creates  sound  perception  external  to  the  listener’s 
head. 

• Binaural sound was the most common approach to spatialization; the use of headphones takes advantage of the lack of crosstalk and of the fixed position between the sound source (the speaker driver) and the ear.

3D  Sound: 

• Often termed spatial sound: sound processed to give the listener the impression of a sound source within a three-dimensional environment.

• New technology under development; the best choice for VR systems.

• The definition of VR requires the person to be immersed in the artificial world by sound as well as sight. Simple stereo sound and reverb are not convincing enough, particularly for sounds coming from the left, right, front, behind, above or below the person, covering 360 degrees in both azimuth and elevation. Hence, 3D sound was developed.


5.3.2.2 3D Sound Synthesis

3D sound synthesis is a signal processing system that reconstructs the localization of each sound source and the room effect, starting from individual sound signals and parameters describing the sound scene (position, orientation and directivity of each source, and the acoustic characterization of the room or space).

Sound  rendering  is  a  technique  that  creates  a  sound  world  by  attaching  a  characteristic  sound  to  each 
object  in  the  scene.  This  pipelined  process  consists  of  four  stages: 

1) Generation of each object’s characteristic sound (recorded, synthesized, or derived by modal analysis of collisions).

2)  Sound  instantiation  and  attachment  to  moving  objects  within  the  scene. 

3)  Calculation  of  the  necessary  convolutions  to  describe  the  sound  source  interaction  within  the 
acoustic  environment. 

4)  Convolutions  are  applied  to  the  attached  instantiated  sound  sources. 

Its similarity to ray tracing and its unique approach to handling reverberation are noteworthy aspects, but it is limited to simple animated worlds and does not necessarily run in real time.
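The four stages can be illustrated with a deliberately small sketch. Everything in it is a hypothetical stand-in rather than a published sound-rendering system: the characteristic sound is a decaying tone, the acoustic environment is reduced to a direct path plus a single floor reflection computed with a mirror-image source, and distance is modelled only by 1/r attenuation and propagation delay.

```python
import math
from dataclasses import dataclass

SAMPLE_RATE = 8_000     # Hz; kept low so the pure-Python convolution stays fast
SPEED_OF_SOUND = 343.0  # m/s in air

# Stage 1: generate the object's characteristic sound (here a decaying tone).
def characteristic_sound(freq_hz: float, seconds: float) -> list[float]:
    n = int(seconds * SAMPLE_RATE)
    return [math.exp(-5 * i / n) * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]

# Stage 2: instantiate the sound and attach it to an object in the scene.
@dataclass
class SoundSource:
    samples: list[float]
    position: tuple[float, float, float]  # metres; y is the height above the floor

# Stage 3: describe the source/environment interaction as an impulse response.
# Hypothetical acoustics: the direct path plus one floor reflection, found by
# mirroring the source in the floor plane y = 0 (image-source method).
def impulse_response(src, listener, floor_reflectance=0.6):
    paths = [(src, 1.0), ((src[0], -src[1], src[2]), floor_reflectance)]
    ir = []
    for pos, gain in paths:
        d = math.dist(pos, listener)
        delay = round(d / SPEED_OF_SOUND * SAMPLE_RATE)  # propagation delay in samples
        if len(ir) <= delay:
            ir.extend([0.0] * (delay + 1 - len(ir)))
        ir[delay] += gain / max(d, 1.0)  # 1/r spreading loss, clamped near the source
    return ir

# Stage 4: apply the convolution to the instantiated source.
def convolve(x, h):
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

source = SoundSource(characteristic_sound(440, 0.05), position=(3.0, 1.0, 0.0))
listener = (0.0, 1.7, 0.0)
ir = impulse_response(source.position, listener)
rendered = convolve(source.samples, ir)
print(len(source.samples), len(ir), len(rendered))
```

Adding further reflections lengthens the impulse response and therefore the cost of the convolution in stage 4.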

Modeling the human acoustic system with a head-related transfer function (HRTF) is another approach. The HRTF is a linear function that depends on the sound source’s position and takes into account many of the cues humans use to localize sounds. Here, the process works as follows:

•  Record  sounds  with  tiny  probe  microphones  in  the  ears  of  a  real  person. 

•  Compare  the  recorded  sound  with  the  original  sounds  to  compute  the  person’s  HRTF. 

• Use the HRTF to derive pairs of finite impulse response (FIR) filters for specific sound positions.

•  When  a  sound  is  placed  at  a  certain  position  in  virtual  space,  the  set  of  FIR  filters  that  correspond 
to  the  position  is  applied  to  the  incoming  sound,  yielding  spatial  sound. 

The  computations  are  so  demanding  that  they  currently  require  special  hardware  for  real-time 
performance. 
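The last two steps, selecting the FIR filter pair for a position and applying it, can be sketched as follows. The filter taps here are short placeholders that only mimic an interaural delay and level difference; real HRTF-derived filters are measured for a listener and typically have far more taps. Choosing the nearest measured azimuth is one simple strategy, assumed here; interpolating between measured positions is another.

```python
# Hypothetical FIR filter pairs (left taps, right taps) per azimuth in degrees;
# positive azimuth is to the listener's right. These are placeholders, not
# measured HRTF data.
HRIR_TABLE = {
    -90: ([1.0, 0.3], [0.0, 0.0, 0.0, 0.4, 0.1]),
    0: ([0.8, 0.2], [0.8, 0.2]),
    90: ([0.0, 0.0, 0.0, 0.4, 0.1], [1.0, 0.3]),
}

def fir_filter(x, taps):
    """Direct-form FIR filter: y[n] = sum_k taps[k] * x[n - k] (output truncated to len(x))."""
    return [sum(t * x[n - k] for k, t in enumerate(taps) if n - k >= 0)
            for n in range(len(x))]

def spatialize(mono, azimuth_deg):
    """Apply the filter pair stored for the azimuth closest to the requested one."""
    nearest = min(HRIR_TABLE, key=lambda a: abs(a - azimuth_deg))
    left_taps, right_taps = HRIR_TABLE[nearest]
    return fir_filter(mono, left_taps), fir_filter(mono, right_taps)

click = [1.0] + [0.0] * 7
left, right = spatialize(click, 70)  # snaps to the 90-degree filter pair
print(left)
print(right)
```

The computational load is visible in `fir_filter`: every output sample needs one multiply-add per tap and per ear, so with, for example, 512-tap filters at 44.1 kHz a single source needs about 45 million multiply-adds per second.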

3D sound imaging approximates binaural spatial audio through interaction with a 3D environment simulation. First, the line-of-sight information between the virtual user and the sound sources is computed. Subsequently, the sounds emitted by these sources are processed based on their location, using software DSP algorithms or simple audio effects modules with delay, filter, pan and reverb capabilities. The final stereo sound sample is then played over headphones through a typical end-user sample player, according to the user’s position. This approach is suitable for simple VE systems where a sense of space is desired rather than an absolute ability to locate sound sources.

The use of speaker locations relies on strategically placed speakers that form a cube of any size to simulate spatial sound. Two speakers are located in each corner of the cube, one up high and one down low. The pitch and volume of the sampled sounds, distributed appropriately through the speakers, give the perception of a sound source’s spatial location. This method is less accurate than spatialization by convolution, but it yields an effective speedup of processing, allowing much less expensive real-time spatial sound.
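The text does not say how sounds are distributed among the speakers. A minimal sketch, assuming the eight speakers sit at the vertices of the cube (a high and a low speaker at each of the four corners of the floor plan) and using trilinear interpolation weights, square-rooted so that the total acoustic power stays constant, might look like this. It controls volume only; pitch changes are not modelled, and the weighting scheme is an assumption.

```python
import itertools
import math

def cube_speaker_gains(pos, size):
    """Gains for 8 speakers at the vertices of a cube with edge length `size`.

    `pos` is the virtual source (x, y, z) with the origin at one lower corner.
    Hypothetical weighting: trilinear interpolation weights (which sum to 1),
    square-rooted so that the sum of squared gains, i.e. power, is 1.
    Returns {vertex: gain}, each vertex coordinate being 0 or 1.
    """
    u = [min(max(c / size, 0.0), 1.0) for c in pos]  # normalise and clamp to [0, 1]
    gains = {}
    for vertex in itertools.product((0, 1), repeat=3):
        w = 1.0
        for ui, vi in zip(u, vertex):
            w *= ui if vi else (1.0 - ui)
        gains[vertex] = math.sqrt(w)
    return gains

g = cube_speaker_gains((1.0, 0.5, 1.0), size=4.0)
for vertex, gain in sorted(g.items(), key=lambda kv: -kv[1])[:3]:
    print(vertex, round(gain, 3))
```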
5.3.2.3 Advantages and Problems

Spatial  sound  facilitates  the  exploitation  of  spatial  auditory  cues  in  order  to  segregate  sounds  emanating 
from  different  directions.  It  increases  the  coherence  of  auditory  cues  with  those  conveyed  by  cognition  and 
other  perceptual  modalities.  This  way  of  sound  processing  is  a  key  factor  for  improving  the  legibility  and 
naturalness  of  a  virtual  scene  because  it  enriches  the  immersive  experience  and  creates  more  “sensual” 
interfaces.  A  3D  audio  display  can  enhance  multi-channel  communication  systems,  because  it  separates 
messages  from  one  another,  thereby  making  it  easier  for  the  operator  to  focus  on  selected  messages  only. 

However, the costs of high-end acoustic rendering are still the biggest barrier to the widespread use of spatial audio today. Exact environmental modeling of the different auditory cues is especially expensive. Common problems in spatial sound generation that tend to reduce immersion are front-to-back reversals, sounds heard inside the head (intracranial localization), and difficulties related to the HRTF.

Spatial audio systems designed for use with headphones have certain limitations, such as the inconvenience of wearing a headset. With speakers, the spatial audio system must know the listener’s position and orientation with respect to the speakers. And as auditory localization is still not fully understood, developers cannot make effective price/performance decisions in the design of spatial audio systems.

5.4  Haptic  Feedback 

Haptic perception relates to the perception of touch and motion. There are four kinds of sensory organs in the hairless skin of the human hand that mediate the sense of touch: Meissner’s Corpuscles, Pacinian Corpuscles, Merkel’s Disks, and Ruffini Endings. As shown in Table 2-2, the rate of adaptation of these receptors to a stimulus, their location within the skin, their mean receptive areas, spatial resolution, response frequency range, and the frequency of maximum sensitivity are, at least partially, understood. The delay time of these receptors ranges from about 50 to 500 msec.
Table 2-2: Functional Features of Cutaneous Mechanoreceptors

Feature                    Meissner          Pacinian          Merkel’s          Ruffini
                           Corpuscles        Corpuscles        Disks             Endings
Rate of adaptation         Rapid             Rapid             Slow              Slow
Location                   Superficial       Dermis and        Basal             Dermis and
                           dermis            subcutaneous      epidermis         subcutaneous
Mean receptive area        13 mm²            101 mm²           11 mm²            59 mm²
Spatial resolution         Poor              Very poor         Good              Fair
Sensory units              43%               13%               25%               19%
Response frequency range   10-200 Hz         70-1000 Hz        0.4-100 Hz        0.4-100 Hz
Min. threshold frequency   40 Hz             200-250 Hz        50 Hz             50 Hz
Sensitive to temperature   No                Yes               Yes               > 100 Hz
Spatial summation          Yes               No                No                Unknown
Temporal summation         Yes               No                No                Yes
Physical parameter sensed  Skin curvature,   Vibration, slip,  Skin curvature,   Skin stretch,
                           velocity, local   acceleration      local shape,      local force
                           shape, flutter,                     pressure
                           slip

It is important to note that the thresholds of different receptors overlap. The perceptual qualities of touch are believed to be determined by the combined inputs from different types of receptors. The receptors work in conjunction to create an operating range for the perception of vibration that extends from at least 0.4 to greater than 500 Hz (Bolanowski et al., 1988). In general, the thresholds for tactile sensations are reduced with increases in duration. Skin surface temperature can also affect tactile sensitivity.

These  details  provide  some  initial  guidance  for  the  design  and  evaluation  of  tactile  display  devices  in  such 
areas as stimulus size, duration and signal frequency. For example, Kontarinis and Howe (1995) note that
the  receptive  areas  and  frequency  response  rates  indicate  that  a  single  vibratory  stimulus  for  a  fingertip  can 
be  used  to  present  vibration  information  for  frequencies  above  70  Hz,  whereas  an  array-type  display  might 
be  needed  for  the  presentation  of  lower  frequency  vibrations. 
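The reasoning can be made concrete by looking up which receptor types in Table 2-2 respond at a given vibration frequency. Above about 70 Hz the Pacinian corpuscles, with their large receptive areas, come into play, so a single contactor can stimulate them; below 70 Hz they fall outside their tabulated response range, and only receptors with smaller receptive areas respond. The sketch encodes the tabulated response ranges and Kontarinis and Howe's 70 Hz rule of thumb; the function names are hypothetical.

```python
# Response frequency ranges (Hz) from Table 2-2.
RESPONSE_RANGE_HZ = {
    "Meissner corpuscles": (10, 200),
    "Pacinian corpuscles": (70, 1000),
    "Merkel's disks": (0.4, 100),
    "Ruffini endings": (0.4, 100),
}

def responding_receptors(freq_hz):
    """Receptor types whose tabulated response range includes freq_hz."""
    return [name for name, (lo, hi) in RESPONSE_RANGE_HZ.items() if lo <= freq_hz <= hi]

def suggested_display(freq_hz, single_point_min_hz=70):
    """Rule of thumb after Kontarinis and Howe (1995): a single vibratory
    stimulus suffices above about 70 Hz; lower frequencies may need an array."""
    return "single vibrator" if freq_hz > single_point_min_hz else "array display"

for f in (5, 50, 250):
    print(f, suggested_display(f), responding_receptors(f))
```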

Additional information is available when looking at a higher level than the receptors just discussed, that is, at the receptivity of the skin itself. The spatial resolution of the finger pad is about 0.15 mm, whereas the
two-point  limit  is  about  1  to  3  mm.  Detection  thresholds  for  features  on  a  smooth  glass  plate  have  been 
cited  as  2  mm  high  for  a  single  dot,  0.06  mm  high  for  a  grating,  and  0.85  mm  for  straight  lines. 
Researchers  have  also  looked  at  the  ability  to  detect  orientation.  The  threshold  for  detecting  the  direction 
of  a  straight  line  has  been  measured  at  16.8  mm.  When  orientation  is  based  on  the  position  of  two  separate 
dots,  the  threshold  was  8.7  mm  when  the  dots  were  presented  sequentially,  and  13.1  mm  when  presented 
simultaneously. Reynier and Hayward (1993) discuss these findings and the results of additional work in this area. The authors also report data on the temporal acuity of the tactile sense, noting that two tactile stimuli (of 1 msec each) must be separated by at least 5.5 msec in order to be perceived as separate.
In  general,  increases  in  tactile  stimulus  duration  can  lower  detection  thresholds. 

When we touch an object, typically both the tactile and the kinesthetic senses are relevant to the experience (Heller, 1991). The object exerts a certain pressure on our hands, which gives a sense of the weight and texture of the object. It also conveys a certain temperature to our hands, and as we move our hands over the object, our kinesthetic sense gives information about its size. Consequently, three basic forms can be distinguished: the vibro-tactile, the temperature, and the kinesthetic sense.

The skin is sensitive to numerous forms of energy: pressure, vibration, electric current, cold and warmth. In relation to display technology, by far the majority of active tactile displays are based on vibration. There are two major principles for generating vibration: electrodes attached to the skin and mechanical vibration. Although both techniques are quite different, psycho-physical experiments show that the characteristics of the skin are the same for both. The human threshold for the detection of vibration is about 28 dB (relative to 1 mm peak) for frequencies in the range 0.4-3 Hz. It decreases for frequencies in the range of 3 to about 250 Hz (at a rate of -5 dB/octave for the range 3-30 Hz, and at a rate of -12 dB/octave for the range 30-250 Hz); for higher frequencies the threshold then increases again (Shimoga, 1993b).
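The threshold curve just described can be written as a piecewise function of frequency, with the slopes expressed per octave (an octave being a doubling of frequency, so the number of octaves between f1 and f2 is log2(f2/f1)). The sketch assumes that the segments join continuously at 3 and 30 Hz, which the text implies but does not state, and it stops at 250 Hz because no rate is given for the rise above that frequency.

```python
import math

def vibration_threshold_db(freq_hz):
    """Approximate vibration detection threshold in dB (same reference as the
    text), from the quoted piecewise slopes. Defined only for 0.4-250 Hz."""
    if not 0.4 <= freq_hz <= 250:
        raise ValueError("model only covers 0.4-250 Hz")
    if freq_hz <= 3:
        return 28.0                                    # flat region, 0.4-3 Hz
    if freq_hz <= 30:
        return 28.0 - 5.0 * math.log2(freq_hz / 3)     # -5 dB per octave
    at_30 = 28.0 - 5.0 * math.log2(30 / 3)             # value where the segments meet
    return at_30 - 12.0 * math.log2(freq_hz / 30)      # -12 dB per octave

for f in (1, 3, 30, 250):
    print(f"{f:>3} Hz: {vibration_threshold_db(f):6.1f} dB")
```

Under these assumptions the model gives about 11.4 dB at 30 Hz and about -25.3 dB at 250 Hz, i.e. the skin is some 53 dB more sensitive around 250 Hz than in the 0.4-3 Hz range.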

The perception of warmth and cold is another sensory modality. The human skin includes separate receptors for warmth and cold; hence different qualities of temperature can be coded primarily by the specific receptors activated. However, this specificity of neural activation is limited. Cold receptors respond not only to low temperatures, but also to very high temperatures (above 45°C). Consequently, a very hot stimulus will activate both warm and cold receptors, which in turn evoke a hot sensation.

The literature also provides information on the just-noticeable difference (JND) for changes of temperature. Yarnitsky and Ochoa (1991) conducted experiments that looked at the JND of temperature change on the palm at the base of the thumb. They found that two different measurement methods gave different results, and that the difference between the results increased as the rate of temperature change increased. Using the more traditional measurement approach based on a method of levels, and starting at a baseline temperature of 32°C, the rate of temperature change (1.5, 4.2, and 6.7°C/sec) had no detectable effect on the JND for warming (~0.47°) or cooling (~0.2°). Subject reaction time was independent of the method used, and also independent of the rate of temperature change, although the reaction time for increases in warming (~0.7°) was significantly longer than the reaction time for increases in cooling (~0.5°). In reviewing work in this area, Zerkus et al. (1995) report findings that the average human can feel a temperature change as small as 0.1°C over most of the body, though at the fingertip a sensitivity of 1°C is typical. They also state that the human comfort zone lies in the region of 13 to 46°C. LaMotte (1978) reports that the threshold of pain varies from 36 to 47°C depending on the locus on the body, stimulus duration, and base temperature.

Most research on kinesthetic perception has focused on the perception of exerted force, limb position and limb movement. The kinesthetic system also uses the signals about force, position, and movement to derive information about other mechanical properties of objects in the environment, such as stiffness and viscosity (Jones, 1997). Understanding the perceptual resolution of the kinesthetic system for such object properties is very important to the design of haptic interfaces. The following is an overview of the results of studies on psychophysical scaling and JNDs for several parameters.

The subjective level of force increases with time (Stevens, 1970; Cain, 1971; Cain, 1973). The JND for force is about 7% (Jones, 1989; Pang, 1991; Tan, 1995). The JND for stiffness (the change in force divided by the change in distance) is much higher. It is difficult to give a general value for the JND of stiffness, since different studies revealed considerably different JNDs: the reported values vary between 19% and 99% (Jones, 1990; Roland, 1977). The JND values for viscosity (a change in force divided by a change in velocity, expressed in Ns/m) depend on the reference values. For small values the JNDs are high, from 83% at 2 Ns/m to 48% at 16 Ns/m (Jones, 1993). For higher values the JND is lower, with reported values ranging from 9.5% to 34% (Jones, 1993; Jones, 1997; Beauregard, 1995; Beauregard, 1997). Finally, the reported JNDs for mass (defined as the ratio of applied force to achieved acceleration) are relatively uniform across studies: 10% is found for weights of 50 g, and a smaller JND for weights above 100 g (Ross, 1982; Brodie, 1984; Brodie, 1988; Ross, 1987; Darwood, 1991; Hellstrom, 2000). For very heavy weights, the JND decreases to 4% (Carlson, 1977).
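JNDs of this kind are Weber fractions: the smallest noticeable change is roughly proportional to the reference value. For haptic rendering this suggests how finely force levels need to be graded. The sketch below assumes that Weber's law holds with a constant fraction over the whole range, which the varying JNDs for stiffness, viscosity and mass show is only an approximation; the 10 N reference and the 1-10 N range are arbitrary examples, and the helper names are hypothetical.

```python
import math

# Approximate JNDs (Weber fractions) quoted in the text.
JND = {"force": 0.07, "mass_50g": 0.10, "heavy_mass": 0.04}

def min_detectable_change(reference, weber_fraction):
    """Smallest change from `reference` a user is expected to notice."""
    return reference * weber_fraction

def distinguishable_levels(low, high, weber_fraction):
    """Number of JND steps between low and high if each step multiplies the
    level by (1 + weber_fraction), i.e. assuming Weber's law over the range."""
    return math.floor(math.log(high / low) / math.log(1 + weber_fraction))

print(min_detectable_change(10.0, JND["force"]))        # N, around a 10 N reference
print(distinguishable_levels(1.0, 10.0, JND["force"]))  # steps between 1 N and 10 N
```

With a 7% JND for force, a change of about 0.7 N around 10 N should be noticeable, and the range from 1 N to 10 N contains about 34 distinguishable steps.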

5.5  Olfactory  Feedback 

The olfactory system has been researched extensively and for different purposes. The entertainment industry has also experimented with synthetic smell production, in the form of accompanying smells to enhance the experience of films (Lefcowitz, 2001; Somerson, 2001). In the Aroma Rama and Smell-o-vision systems, smells were released in cinemas during certain scenes of the film. For the John Waters film “Polyester” in 1981, the audiences were given “scratch and sniff” cards and asked to release smells at certain points during the film. These experimental systems were mainly novelties and not very successful, with audience reactions ranging from allergic reactions to nausea.

Those systems were all manually controlled, and the scents were all pre-produced. With respect to the inclusion of smell in the user interface, it only becomes interesting when the production of smell can be computer-controlled and based on computerized descriptions of particular smells. Then it will be possible to include olfactory displays in computer systems. For smell to gain acceptance among audiences, many more factors need to be in place, such as natural-smelling odors, non-allergenic smells, etc.

The  main  idea  of  how  an  olfactory  display  would  work  is  that  the  user  has  a  peripheral  device  for  smell 
production.  This  device  is  connected  to  the  computer,  and  controlled  by  the  computer.  Using  codified 
descriptions  of  smell,  the  computer  can  signal  the  release  of  a  particular  smell.  A  specific  smell  is 
generated  by  mixing  a  set  of  primary  odors,  most  likely  in  the  form  of  oil-based  fragrances  (Bonsor,  2001; 
Cook,  1999). 
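The control flow described above, a codified smell description translated into the release of primary odors, can be sketched as follows. The primary odors, their weights, and the idea of splitting a release time proportionally among valve channels are all hypothetical assumptions made for illustration; no real device or fragrance set is implied.

```python
# Hypothetical codified smell descriptions: relative weights of primary odors.
# Neither the primary odors nor the weights come from a real device.
SMELLS = {
    "pine forest": {"resinous": 0.6, "green": 0.3, "earthy": 0.1},
    "coffee": {"roasted": 0.7, "earthy": 0.2, "sweet": 0.1},
}

def release_schedule(smell, total_ms):
    """Split a release of total_ms milliseconds across primary-odor channels
    in proportion to the smell's weights (e.g. hypothetical valve open times)."""
    weights = SMELLS[smell]
    total_weight = sum(weights.values())
    return {odor: round(total_ms * w / total_weight) for odor, w in weights.items()}

print(release_schedule("coffee", 500))
```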